composed of 54 flowers, of which 49 were Versicolor and five were Virginica. Based on this node (in fact a leaf), the probability of classifying a flower as Versicolor was 91% and as Virginica was 9%. The other subspace was composed of 46 flowers, of which 45 were Virginica and one was Versicolor. On this leaf node, the probability of classifying a flower as Virginica was 98% and as Versicolor was 2%.

1) root 150 100 setosa (0.33 0.33 0.33)
  2) Petal.Length< 2.45 50 0 setosa (1 0 0) *
  3) Petal.Length>=2.45 100 50 versicolor (0 0.5 0.5)
    6) Petal.Width< 1.75 54 5 versicolor (0 0.91 0.09) *
    7) Petal.Width>=1.75 46 1 virginica (0 0.02 0.98) *
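The leaf counts and probabilities quoted above can be checked directly against the data. The following is a minimal sketch in base R, assuming the built-in `iris` data set and the two split thresholds printed in the tree (2.45 and 1.75); the names `left`, `right`, `leaf_a` and `leaf_b` are illustrative only and do not come from any package.

```r
# Base R only; `iris` ships with R in the datasets package.
data(iris)

# First split: Petal.Length < 2.45 separates one pure subspace.
left  <- iris[iris$Petal.Length <  2.45, ]
right <- iris[iris$Petal.Length >= 2.45, ]

# Second split, applied to the remaining flowers: Petal.Width < 1.75.
leaf_a <- right[right$Petal.Width <  1.75, ]
leaf_b <- right[right$Petal.Width >= 1.75, ]

# Class counts and class proportions in each leaf.
counts_a <- table(leaf_a$Species)
counts_b <- table(leaf_b$Species)
probs_a  <- round(prop.table(counts_a), 2)
probs_b  <- round(prop.table(counts_b), 2)

print(counts_a); print(probs_a)
print(counts_b); print(probs_b)
```

The proportions come out as 0, 0.91, 0.09 for the first leaf and 0, 0.02, 0.98 for the second, which are the class probabilities shown in parentheses in the printed tree.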

The C50 algorithm

The earliest version of this family of decision tree algorithms was Iterative Dichotomiser 3 (ID3), which was developed by Ross Quinlan [Quinlan, 1986]. ID3 was updated to C4.5 [Quinlan, 1993]. C5.0 (R package C50) is the most recently updated version. The basic measurement used in this algorithm for partitioning a data space is the information gain. One of the features of C50 is its graphical presentation of the fitted tree. The R call for C50 is shown below

C5.0(formula,data,trials)
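As a hedged illustration of this call, the sketch below fits a single tree to the built-in `iris` data. It assumes the C50 package is installed; the object names `fit` and `pred` are illustrative, and the settings are not necessarily those used for the figure in this section.

```r
# Assumes the C50 package is available: install.packages("C50")
library(C50)

data(iris)

# trials = 1 fits a single tree; trials > 1 requests boosting iterations.
fit <- C5.0(Species ~ ., data = iris, trials = 1)

# Text summary of the fitted tree and its training-set errors.
print(summary(fit))

# Graphical presentation of the fitted tree.
plot(fit)

# Predicted classes for the training flowers, cross-tabulated with the truth.
pred <- predict(fit, newdata = iris)
print(table(predicted = pred, actual = iris$Species))
```

Here `formula` is `Species ~ .` (all four measurements as predictors) and `data` is the `iris` data frame.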

For instance, Figure 3.41 shows an example of a C50 tree generated for the Iris data.

Fig. 3.41. An unpruned C50 tree constructed for the Iris data.
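The information gain mentioned above can be illustrated with a small base R sketch. The helper functions `entropy` and `info_gain` are hypothetical names written for this example; they implement plain Shannon-entropy information gain for a binary split, which is not necessarily the exact criterion applied inside the C50 package.

```r
# Shannon entropy (in bits) of a vector of class labels.
entropy <- function(labels) {
  if (length(labels) == 0) return(0)   # an empty subset carries no uncertainty
  p <- prop.table(table(labels))
  p <- p[p > 0]                        # drop empty classes to avoid 0 * log2(0)
  -sum(p * log2(p))
}

# Information gain of splitting `labels` by a logical condition `left`:
# parent entropy minus the size-weighted entropy of the two children.
info_gain <- function(labels, left) {
  w_left <- sum(left) / length(labels)
  entropy(labels) -
    (w_left * entropy(labels[left]) + (1 - w_left) * entropy(labels[!left]))
}

data(iris)
gain_length <- info_gain(iris$Species, iris$Petal.Length < 2.45)
gain_width  <- info_gain(iris$Species, iris$Petal.Width  < 1.75)
print(c(Petal.Length = gain_length, Petal.Width = gain_width))
```

For the split Petal.Length < 2.45 the parent entropy is log2(3) bits, the left child is pure, and the right child holds two classes in equal numbers, so the gain is log2(3) - 2/3, about 0.918 bits.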